Performance of Parallel K-Means Algorithms in Java

Abstract

K-means is a well-known clustering algorithm often used for its simplicity and potential efficiency. Its properties and limitations have been investigated by many works reported in the literature. K-means, though, suffers from computational problems when dealing with large datasets with many dimensions and a great number of clusters. Therefore, many authors have proposed and experimented with different techniques for the parallel execution of K-means. This paper describes a novel approach to parallel K-means which, today, is based on commodity multicore machines with shared memory. Two reference implementations in Java are developed and their performances are compared. The first one is structured according to a map/reduce schema that leverages the built-in multi-threaded concurrency automatically provided by Java to parallel streams. The second one, allocated on the available cores, exploits the parallel programming model of the Theatre actor system, which is control-based, totally lock-free, and purposely relies on threads as coarse-grain “programming-in-the-large” units. The experimental results confirm that some good performance can be achieved through the implicit and intuitive use of Java concurrency in parallel streams. However, better performance can be guaranteed by the modular Theatre implementation, which proves more adequate for an exploitation of the computational resources.
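The map/reduce structure mentioned in the abstract can be illustrated with a minimal Java sketch. This is not the paper's code: the class name `ParallelKMeansSketch`, the methods `step`, `fit` and `nearest`, and the choice to leave the centroid of an empty cluster unchanged are all assumptions made for illustration. The map phase assigns each point to its nearest centroid, and the reduce phase merges per-cluster coordinate sums and counts, using only the standard `java.util.stream` API on a parallel stream.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration; names and structure are not taken from the paper.
class ParallelKMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to point p.
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = squaredDistance(p, centroids[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // One K-means iteration on a parallel stream.
    // Map: point -> (cluster index, [coordinates..., 1]).
    // Reduce: element-wise sum of the accumulators of each cluster.
    static double[][] step(double[][] points, double[][] centroids) {
        final int dim = centroids[0].length;
        Map<Integer, double[]> sums = Arrays.stream(points)
            .parallel()
            .collect(Collectors.toMap(
                p -> nearest(p, centroids),
                p -> {
                    double[] acc = Arrays.copyOf(p, dim + 1);
                    acc[dim] = 1.0; // last slot counts the points
                    return acc;
                },
                (a, b) -> {
                    // Fresh array per merge, so no shared state is mutated.
                    double[] merged = new double[dim + 1];
                    for (int i = 0; i <= dim; i++) merged[i] = a[i] + b[i];
                    return merged;
                }));

        double[][] next = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            double[] acc = sums.get(c);
            if (acc == null) {
                // Assumption: an empty cluster keeps its previous centroid.
                next[c] = centroids[c].clone();
            } else {
                next[c] = new double[dim];
                for (int i = 0; i < dim; i++) next[c][i] = acc[i] / acc[dim];
            }
        }
        return next;
    }

    // Repeats step until the centroids stop changing or maxIterations is hit.
    static double[][] fit(double[][] points, double[][] initial, int maxIterations) {
        double[][] current = initial;
        for (int it = 0; it < maxIterations; it++) {
            double[][] next = step(points, current);
            if (Arrays.deepEquals(next, current)) return next;
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] initial = {{0, 0}, {10, 10}};
        System.out.println(Arrays.deepToString(fit(points, initial, 100)));
    }
}
```

In this sketch the only parallelism is the implicit one requested with `.parallel()`; the runtime decides how the points are split across threads. An actor-based version would instead partition the dataset explicitly across workers, which the sketch does not attempt to show.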

Similar resources

Comparative Analysis of Parallel K Means and Parallel Fuzzy C Means Cluster Algorithms

In this paper, we give a short review of recent developments in clustering. Clustering is the process of grouping data, where the grouping is established by finding similarities between data items based on their characteristics. Such groups are termed clusters. Clustering is a procedure for organizing objects into groups, based on the principle of maximizing the intra-c...

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on clustering English documents has been carried out. This raises the question of whether those results extend to other languages. Since the goal of document clustering is grouping of docum...

Irregular Parallel Algorithms in JAVA

The nested data-parallel programming model supports the design and implementation of irregular parallel algorithms. This paper describes work in progress to incorporate nested data parallelism into the object model of Java by developing a library of collection classes and adding a forall statement to the language. The collection classes provide parallel implementations of operations on the coll...

Pthread Parallel K-means

K-means is a popular non-hierarchical method for clustering large datasets. Its time requirements increase linearly with the size of the dataset, which makes it particularly suited to extremely large datasets such as those found in digital libraries. The method was developed by MacQueen [4] in 1967. In our project we take a uniprocessor k-means algorithm and implement a parallel k-means algorith...

Journal

Journal title: Algorithms

Year: 2022

ISSN: 1999-4893

DOI: https://doi.org/10.3390/a15040117